Abstract

The Potts model is a powerful tool for uncovering community structure in complex networks. Here, we propose a new framework that reveals the optimal number of communities and the stability of network structure by quantitatively analyzing the dynamics of the Potts model. Specifically, we model the Potts community detection procedure as a Markov process, which has a clear mathematical explanation. We then show that the local uniform behavior of spin values across multiple timescales, in the representation of the Markov variables, naturally reveals the network's hierarchical community structure. In addition, critical topological information regarding the multivariate spin configuration can be inferred from the spectral signatures of the Markov process. Finally, an algorithm is developed to determine fuzzy communities based on the optimal number of communities and the stability across multiple timescales. The effectiveness and efficiency of our algorithm are theoretically analyzed as well as experimentally validated.
I. INTRODUCTION


Community structure detection is a main focus of complex network studies and has attracted a great deal of attention from various scientific fields. Intuitively, a community is a group of nodes in the network that are more densely connected internally than with the rest of the network. In the early stage, these studies were restricted to regular networks. Recently, inspired by several common characteristics of real networks, for example the scale-free property, the majority of studies have focused on networks with practical applications. In this sense, community structure may provide insight into the relation between the topology and the function of real networks and can be of considerable use in many fields.

A well-known approach to this problem is the modularity concept, proposed by Newman et al. [1-3] to quantify a network's partition. Optimizing modularity is effective for community structure detection and has been widely used in many real networks. However, as pointed out by Fortunato et al., modularity suffers from the resolution limit problem, which concerns the reliability of the communities detected through the optimization of modularity. It has also been claimed that the resolution limit problem is attributable to the coexistence of multiple scale descriptions of the network's topological structure, while only one scale is obtained by directly optimizing the modularity. In addition, the definition of modularity only considers the significance of the link density in the static topological structure, and it is unclear how modularity-based community structure is correlated with the dynamical behavior of the network.

Complementary to the modularity concept, many efforts have been devoted to understanding the properties of the dynamical processes taking place on the underlying networks. Specifically, researchers have begun to investigate the correlation between community structure and the dynamics in networks. For example, Arenas et al. pointed out that synchronization reveals the topological scales in complex networks. In addition, the Markov process on a network has been extensively studied and used to uncover the community structure of the network. In one such study, a Markov process on a network is introduced to define the distances among network nodes, and an algorithm is then proposed to partition the network into communities based on these distances. In [8], the authors proposed to quantify and rank network partitions in terms of their stability, defined as the clustered autocovariance of the Markov process taking place on the network.
The Potts dynamical model has also been applied to uncover community structure in networks. Community detection using the Potts model [12], also known as the superparamagnetic clustering method, has been intensively studied since its introduction by Blatt et al. [13]. In the model, Potts spin variables are assigned to the nodes of a network with community structure, and interactions exist between neighboring spins. The structural clusters can then be recovered by grouping similar spins in the system, since spins have more interactions inside communities than outside. The physical aspects of the method, such as its dependence on the definition of the neighbors, the type of interactions, the number of possible states, and the size of the dataset, have been well studied [15, 19]. Reichardt and Bornholdt [16] introduced a new spin glass Hamiltonian with a global diversity constraint to identify proper community structures in complex networks. The method allows one to identify communities by mapping the graph onto a zero-temperature $q$-state Potts model with nearest-neighbor interactions. Recently, Li et al. [17] noticed that a lot of useful information related to community structure can be revealed by the Potts model and its spectral characterization. Despite these excellent works, uncovering the dynamics of the spin configuration across multiple timescales is still a tough task that has not yet been clearly resolved. In essence, one can consider the timescale as an intrinsic resolution parameter for the partition: over short timescales from the beginning, many small clusters should be coherent; as time evolves, there will be fewer and larger communities that persist under the dynamics of the Potts model. We need to measure the change in the stability or robustness of the spin configuration as time evolves and, furthermore, find reasonable partitions at intermediate timescales. However, this problem is difficult to solve using the Potts model alone.

We notice that the dynamics of a Markov process can naturally reflect the intrinsic properties of spin dynamics with community structure and exhibit local uniform behaviors. However, the relationship between the dynamics of the Potts model and the Markov process has not been well studied. In this work, using the Potts model and the spin-spin correlation, we first investigate this phenomenon, then uncover the relation between the community structure of a network and the meta-stability of its spin dynamics, and further propose the signature of communities to characterize and analyze the underlying spin configuration. For any given network, one can straightforwardly derive critical information related to its community structure, such as the stability of its community structures and the optimal number of communities across multiple timescales, without using particular algorithms. This overcomes the inefficiency of the classic methods, such as the resolution limit of modularity $Q$ [18]. Based on the theoretical analysis, we then develop a parameter-free algorithm to numerically detect community structure, which is able to identify fuzzy communities with overlapping nodes by associating each node with a participation index that describes the node's involvement in each community. We also demonstrate that the algorithm is scalable and effective for large real-world networks.

The outline of the paper is as follows. Section II introduces the Potts model and the 
motivation of this work. In Section III, we present a Markov stochastic model, which explains 
the relationship between spectral signatures and community structure. Section IV describes 
the critical information derived from the model, such as stability across multiple timescales 
and the optimal number of communities. Our algorithm is formally described in Section V.
Section VI then presents numerical computations on some representative networks to validate the
effectiveness and efficiency of the algorithm. Finally, Section VII concludes
this paper.


II. POTTS MODEL AND SPIN-SPIN CORRELATION 


The Potts model is one of the most popular models in statistical mechanics [21]. It models an inhomogeneous ferromagnetic system where each data point is viewed as a marked node in the network. Here the mark is a cluster label, or spin value, associated with the node. The configuration of the system is defined by the interactions between the nodes and controlled by the temperature. At low temperatures, all labels are identical (spins are aligned), which is equivalent to the presence of a single cluster. As the temperature rises, the single cluster starts to split and the interactions between weakly coupled nodes gradually vanish.

Consider an unweighted network with $N$ nodes and no self-loops. A spin configuration $S = \{s_i\}$ is defined by assigning each node $i$ a spin label $s_i$, which may take integer values $s_i = 1, \dots, q$; that is, each spin can be in one of $q$ different states. The Hamiltonian $H(S)$ of a Potts model with spin configuration $S$ is given by:

$$H(S) = \sum_{\langle i,j \rangle} J_{ij}\left(1 - \delta_{s_i s_j}\right), \qquad i, j = 1, \dots, N \tag{1}$$

where the sum runs over all neighboring nodes, denoted $\langle i,j \rangle$, $J_{ij}$ is the interaction strength between spin $i$ and spin $j$, and $\delta_{s_i s_j}$ is 1 if $s_i = s_j$ and 0 otherwise. $J_{ij}$ is set as

$$J_{ij} = J_{ji} = \ldots \tag{2}$$

where $\langle k \rangle$ is the average number of neighbors per node and $d_{ij}$ is the Euclidean distance between nodes $i$ and $j$. The interaction $J_{ij}$ is a monotonically decreasing function of $d_{ij}$, so minimizing $H(S)$ makes the spins $s_i$ and $s_j$ tend to take the same value as $d_{ij}$ becomes smaller.
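As a minimal sketch of Eq. (1), the following Python function evaluates the Potts energy of a spin configuration. The function name `potts_energy`, the toy network, and the coupling values are illustrative assumptions, not taken from the paper; in particular the couplings are simply given numbers rather than computed from Eq. (2).

```python
from typing import Dict, List, Tuple


def potts_energy(spins: List[int], couplings: Dict[Tuple[int, int], float]) -> float:
    """Eq. (1): sum of J_ij * (1 - delta(s_i, s_j)) over interacting pairs <i, j>."""
    energy = 0.0
    for (i, j), j_ij in couplings.items():
        if spins[i] != spins[j]:  # delta is 0, so the pair contributes J_ij
            energy += j_ij
    return energy


# Hypothetical toy network: two triangles joined by one weak edge (2, 3).
couplings = {
    (0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
    (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0,
    (2, 3): 0.2,
}

aligned = [1, 1, 1, 1, 1, 1]      # one cluster: no pair is penalized
two_blocks = [1, 1, 1, 2, 2, 2]   # only the weak edge (2, 3) is penalized
print(potts_energy(aligned, couplings), potts_energy(two_blocks, couplings))
```

In this toy case, splitting the network along the weak edge costs far less energy than any split that cuts through a triangle, which is the sense in which low-energy configurations follow the community structure.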

To characterize the coherence and correlation between two spins, the spin-spin correlation function $C_{ij}$ is defined as the thermal average of $\delta_{s_i s_j}$ [15]:

$$C_{ij} = \langle \delta_{s_i s_j} \rangle \tag{3}$$

It represents the probability that the spin variables $s_i$ and $s_j$ have the same value. $C_{ij}$ takes values in the interval $[0, 1]$, representing the continuum from no coupling to perfect accordance of spins $i$ and $j$. A homogeneous system in which $J_{ij}$ is determined has two phases. At high temperatures, the system is in the paramagnetic phase and the spins are disordered: $C_{ij} \approx 1/q$ for all nodes $i$ and $j$, where $q$ is the number of possible spin values. At low temperatures, the system turns into the ferromagnetic phase and all the spins are aligned: $C_{ij} \approx 1$ holds for every node pair $i$ and $j$.
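Read as a probability, Eq. (3) can be estimated directly from sampled spin configurations. The sketch below assumes the samples are already available (for instance from a Monte Carlo run) and simply counts how often two spins agree; the function name and the four hand-written samples are hypothetical.

```python
from typing import List


def spin_spin_correlation(samples: List[List[int]]) -> List[List[float]]:
    """Estimate C_ij as the fraction of samples in which s_i equals s_j."""
    n = len(samples[0])
    counts = [[0] * n for _ in range(n)]
    for spins in samples:
        for i in range(n):
            for j in range(n):
                if spins[i] == spins[j]:
                    counts[i][j] += 1
    m = len(samples)
    return [[c / m for c in row] for row in counts]


# Hypothetical samples for three nodes: nodes 0 and 1 always agree,
# node 2 agrees with them in only half of the samples.
samples = [[1, 1, 2], [3, 3, 3], [2, 2, 1], [1, 1, 1]]
C = spin_spin_correlation(samples)
print(C[0][1], C[0][2])
```

The estimate is symmetric with unit diagonal by construction, which matches the properties of $C_{ij}$ used later when it is normalized into a transition matrix.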

If the system is not homogeneous but has a community structure, the states are not just ferromagnetic or paramagnetic. We assume that, as the temperature decreases, the spins go through a hierarchy of local uniform states (meta-stable states), as shown in Fig. 1, before they reach a globally stable state in which all spins have the same value. In each local uniform state, the spin values of nodes within the same community are identical, and the whole system is divided into several different local regions (communities) due to the dense connections. Correspondingly, we can calculate the hitting and exiting times of each local uniform state to analyze its stability. The hitting (exiting) time is the timescale at which the system enters (leaves) the local uniform state; between the two, the nodes' spin values stay stably in this state. In this way we can associate the community structure with a local uniform state. For a well-formed community structure, each community should be cohesive, which means that it is easy for the nodes to hit the local uniform state. Thus, the hitting time should be early. At the same time, communities should stand clear of each other, which means that it is hard for nodes to exit the local uniform state, and therefore the exiting time should be late.
FIG. 1: Dynamics of the spin configuration of four communities (A, B, C, D) as they go through several local uniform states to the global stable state with decreasing temperature. Different spin values are depicted by different shapes. At temperature $t_4$ ($t_4 > t_3 > t_2 > t_1$, where $t_i$ denotes the temperature at which $i$ different spin states exist in the system), we observe four local uniform spin state distributions corresponding to four communities. At temperature $t_3$, the circle and triangle mix together. At $t_2$, the square and diamond mix together in terms of their hierarchical structure. Finally, at $t_1$, only one spin state is left, in which all nodes have an identical spin distribution.


Hence, there should be a big gap between the hitting and exiting times when a well-formed 
community structure exists. 

Once $J_{ij}$ has been determined, $C_{ij}$ can be obtained by a Monte Carlo procedure. We used the Swendsen-Wang (SW) algorithm because it exhibits a much smaller autocorrelation time [20] than standard methods. For a network with $N$ nodes, the SW algorithm can be briefly described as follows:

1. Generate the initial configuration of the system $S_1 = (s_1, s_2, \dots, s_N)$ randomly, where $s_i$ is the spin value of node $i$, chosen at random from $1$ to $q$, and $q = N/2$ is the initial number of spin values.

2. Generate the configuration $S_2$ based on $S_1$:

(a) Visit all pairs of nodes $\langle i,j \rangle$ with interaction $J_{ij} > 0$, where $J_{ij}$ is the spin interaction computed only from the adjacency structure of the network. Node $i$ and node $j$ are frozen together with probability

$$p_{ij} = 1 - \exp\left(-\frac{J_{ij}}{T}\,\delta_{s_i s_j}\right) \tag{4}$$

where $\delta_{s_i s_j}$ is 1 if $s_i = s_j$ and 0 otherwise, and $T$ is the temperature. Calculate this for all pairs of spins and put a frozen bond between every frozen pair.

(b) Define an SW cluster as a cluster containing all spins that are connected to each other by a path of frozen bonds. Since nodes are frozen only if they have the same spin value, the SW clusters only need to be identified among nodes with the same spin value.

(c) For each SW cluster, draw a random number from $1, 2, \dots, q$ and assign it as the spin value of all nodes in the cluster. After going through all SW clusters, the new configuration $S_2$ is generated.

3. Iterate Step 2. The value of $C_{ij}$ can then be calculated.

We set the initial number of possible spin values to $q = N/2$ because if the number of communities is larger than $q$, some spin states will not be populated. For a specific node, we choose an initial spin value randomly from $1$ to $q$.
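The steps above can be sketched as a single Swendsen-Wang sweep in Python. This is an illustrative sketch rather than the authors' implementation: the function name `sw_sweep`, the union-find bookkeeping, the toy couplings, the temperature, and the choice $q = 3$ are all assumptions made for the example.

```python
import math
import random
from typing import Dict, List, Tuple


def sw_sweep(
    spins: List[int],
    couplings: Dict[Tuple[int, int], float],
    temperature: float,
    q: int,
    rng: random.Random,
) -> List[int]:
    """One sweep of Step 2: freeze bonds, find SW clusters, relabel them."""
    n = len(spins)
    parent = list(range(n))  # union-find forest over the nodes

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Step 2(a): pairs with equal spins are frozen with p = 1 - exp(-J_ij / T);
    # pairs with different spins have p = 0 and are skipped.
    for (i, j), j_ij in couplings.items():
        if j_ij > 0 and spins[i] == spins[j]:
            p_freeze = 1.0 - math.exp(-j_ij / temperature)
            if rng.random() < p_freeze:
                parent[find(i)] = find(j)

    # Steps 2(b)-(c): every SW cluster draws one new spin value from 1..q.
    new_label: Dict[int, int] = {}
    new_spins = []
    for i in range(n):
        root = find(i)
        if root not in new_label:
            new_label[root] = rng.randint(1, q)
        new_spins.append(new_label[root])
    return new_spins


# Hypothetical toy network: two triangles joined by one weak edge (2, 3).
couplings = {
    (0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
    (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0,
    (2, 3): 0.2,
}
rng = random.Random(0)
spins = [rng.randint(1, 3) for _ in range(6)]
for _ in range(100):  # Step 3: iterate the sweep
    spins = sw_sweep(spins, couplings, temperature=0.5, q=3, rng=rng)
print(spins)
```

Recording the configuration after each sweep and counting how often two spins agree gives a direct estimate of $C_{ij}$ in Eq. (3).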


III. A STOCHASTIC MODEL 


The Markov process is a useful tool and has been applied to find communities [8, 9]. In order to establish the connection between the community structure and the local uniform behavior of the Potts model, we introduce a Markov stochastic model featuring spectral signatures for the network. Let $A = (V, E)$ denote a network, where $V$ is the set of nodes and $E$ is the set of edges (or links). Consider a Markov random walk process defined on $A$, in which a random walker freely walks from one node to another along the links. After arriving at a node, the walker randomly selects one of its neighbors and moves there. Let $X = \{X_t, t \geq 0\}$ denote the walker positions, and let $P\{X_t = i\}$, $1 \leq i \leq N$, be the probability that the walker hits node $i$ after exactly $t$ steps. For $i_t \in V$, we have $P(X_t = i_t \mid X_0 = i_0, X_1 = i_1, \dots, X_{t-1} = i_{t-1}) = P(X_t = i_t \mid X_{t-1} = i_{t-1})$. That is, the next state of the walker is determined only by its current state. Hence, this stochastic process is a discrete Markov chain with state space $V$. Furthermore, $X_t$ is homogeneous because $P(X_t = j \mid X_{t-1} = i) = p_{ij}$, where $p_{ij}$ is the transition probability from node $i$ to node $j$.

To relate the Markov process to the patterns of the Potts model, $p_{ij}$ is defined as

$$p_{ij} = \frac{C_{ij}}{\sum_{k} C_{ik}} \tag{5}$$

where $C_{ij}$ is the spin-spin correlation function defined in Eq. (3). Via this representation, the tools of stochastic theory and finite-state Markov processes [8, 9] can be utilized for the purpose of community structure analysis. Let $P$ be the transition probability matrix; we have:

$$P = D^{-1} C \tag{6}$$

where $D$ is the diagonal degree matrix of $C$, with $D_{ii} = \sum_{j} C_{ij}$. Let $p_{ij}^{t}$ be the probability of hitting node $j$ after $t$ steps starting from node $i$; we have:

$$P^{t} = \left(p_{ij}^{t}\right) \tag{7}$$
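A minimal pure-Python sketch of this construction follows: the correlation matrix is row-normalized to obtain $P = D^{-1}C$, and $P^t$ is formed by repeated multiplication. The helper names and the 3-node correlation matrix are hypothetical and chosen only for illustration.

```python
from typing import List

Matrix = List[List[float]]


def transition_matrix(C: Matrix) -> Matrix:
    """P = D^{-1} C: divide each row of C by its row sum D_ii."""
    P = []
    for row in C:
        degree = sum(row)
        P.append([c / degree for c in row])
    return P


def mat_mul(A: Matrix, B: Matrix) -> Matrix:
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(P: Matrix, t: int) -> Matrix:
    """t-step transition probabilities; t = 0 gives the identity matrix."""
    n = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = mat_mul(result, P)
    return result


# Hypothetical correlation matrix: nodes 0 and 1 strongly correlated, node 2 weakly.
C = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
P = transition_matrix(C)
P5 = mat_power(P, 5)
for row in P5:
    print([round(x, 3) for x in row])
```

Every row of $P^t$ remains a probability distribution, and for small $t$ a walker started at node 0 or 1 stays mostly within that pair, which is the local uniform behavior the later sections quantify.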

For this ergodic Markov process, $P^t$ corresponds to the probability of transitions between states over a period of $t$ time steps. To compute the transition matrix $P^t$, the eigenvalue decomposition of $P$ is used. If $\lambda_k$ with $k = 1, \dots, N$ denote the eigenvalues of $P$, and its right and left eigenvectors $u_k$ and $f_k$ are scaled to satisfy the orthonormality relation [9]:

$$u_k^{T} f_l = \delta_{kl} \tag{8}$$

the spectral representation of $P$ is given by

$$P = \sum_{k=1}^{N} \lambda_k u_k f_k^{T} \tag{9}$$

and consequently

$$P^{t} = \sum_{k=1}^{N} \lambda_k^{t} u_k f_k^{T} \tag{10}$$

We assume that the eigenvalues of $P$ are sorted such that $\lambda_1 = 1 > |\lambda_2| > |\lambda_3| > \dots > |\lambda_N|$. From the theory of spectral clustering [30], $P^t$ can be calculated as a sum of $N$ matrices

$$P^{t} = \sum_{k=1}^{N} \lambda_k^{t}\, \frac{u_k u_k^{T} D}{u_k^{T} D u_k} \tag{11}$$

each of which depends only on the eigensystem of $P$. This is accomplished by exploiting the fact that $u_n^{T} D u_m = \delta_{nm}$, because $P$ is defined by a normalized symmetric correlation matrix $C$. Because the largest eigenvalue is $\lambda_1 = 1$, as $t \to \infty$ the matrix $P^t$ converges to

$$P^{(\infty)} = \frac{u_1 u_1^{T} D}{u_1^{T} D u_1}.$$

The convergence of every initial distribution to the stationary distribution $P^{(\infty)}$ corresponds to the fact that the spins of the whole system ultimately reach exactly the same value as the temperature decreases. This perspective belongs to the timescale $t \to \infty$, at which all $\lambda_k^t$ go to 0 except for the largest one, $\lambda_1^t = 1$. At the other extreme, the timescale $t = 0$, $P^t$ becomes the identity matrix. All of its columns are different, and the system disintegrates into as many spin values as there are nodes.
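As a worked check of the spectral sum, consider a hypothetical two-node system (not taken from the paper) whose correlation matrix has unit diagonal and off-diagonal entry $c$ with $0 < c < 1$. The derivation assumes only $P = D^{-1}C$ from Eq. (6) and that each term of the sum has the form $\lambda_k^{t}\, u_k u_k^{T} D / (u_k^{T} D u_k)$.

```latex
\begin{align*}
C &= \begin{pmatrix} 1 & c \\ c & 1 \end{pmatrix}, \qquad
D = (1 + c)\, I, \qquad
P = D^{-1} C = \frac{1}{1+c} \begin{pmatrix} 1 & c \\ c & 1 \end{pmatrix}, \\
\lambda_1 &= 1, \quad u_1 = (1, 1)^{T}; \qquad
\lambda_2 = \frac{1-c}{1+c}, \quad u_2 = (1, -1)^{T}, \\
P^{t} &= \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}
       + \left( \frac{1-c}{1+c} \right)^{t}
         \frac{1}{2} \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}.
\end{align*}
```

At $t = 0$ the two terms sum to the identity matrix, at $t = 1$ they reproduce $P$, and as $t \to \infty$ only the first term survives, so both rows converge to $(1/2, 1/2)$. The closer $c$ is to 1, that is, the more strongly the two spins are correlated, the faster the second term decays.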

The eigensystem of the transition matrix $P^t$ can be naturally correlated with the dynamic process of the Potts model. However, it needs preprocessing because $P^t$ is asymmetric. We simply extend $P^t$ to the symmetric form $G^{(t)} = (P^t + (P^t)^T)/2$, which is defined as the spin correlation matrix at time $t$. The eigensystem of $G^{(t)}$ has the following relation to that of $P^t$.

Lemma 1 The eigenvalues and corresponding eigenvectors of the matrix $G^{(t)}$ are exactly the same as those of the matrix $P^t$.

The proof of Lemma 1 is evident. By Lemma 1, since $G^{(t)}$ has the same eigensystem as $P^t$, it can be used to unfold the dynamics of the Potts states. We can also use $G^{(t)}$ to find reasonable partitions with many algorithms, such as the K-means algorithm and the GN algorithm [2].


IV. SIGNATURES OF COMMUNITIES IN POTTS MODEL ACROSS MULTIPLE TIMESCALES


In this section, we uncover the signatures of communities in the Potts model across multiple timescales and use them to identify community structure. This scheme benefits from the above analysis, namely the connection between the Potts model and the Markov process through a stochastic model. A lot of useful information, such as the optimal number of communities and the stability of networks at an arbitrary timescale, can be uncovered as follows.

Suppose the partition method divides the network $A$ into $K$ clusters or sets $V_k \subset V$, $k \in \{1, 2, \dots, K\}$, which are disjoint, so that the sets $V_1, V_2, \dots, V_K$ together form a partition of the node set $V$. The number of nodes in each cluster is denoted by $N_k = |V_k|$. Numerically, we deal with the dynamical process of the community structure represented by the spin configuration, taking the time series into consideration. Therefore, we define the signature of a given community $k$ by the ratio of inner correlations as

$$S_k^{(t)} = \sum_{i,j \in V_k} \frac{G_{ij}^{(t)}}{N_k} \tag{12}$$

$S_k^{(t)}$ can be viewed as a function of the timescale $t$, and we can use it to study the trend of the community structure as time goes on. Given the number of clusters $K$, the clusters are found by maximizing the objective function

$$J^{(t)} = \sum_{k=1}^{K} \sum_{i,j \in V_k} \frac{G_{ij}^{(t)}}{N_k} \tag{13}$$

over all partitions. The objective $J^{(t)}$ can be interpreted as the sum of the cluster signatures $S_k^{(t)}$ over the clusters $V_k$. The form of Eq. (13) is related to some well-known partition measures. For example, $J^{(t)}$ is an extension of the ratio cut criterion, defined as the sum of the number of inter-community edges divided by the total number of edges, obtained by replacing the adjacency matrix with the spin correlation matrix $G^{(t)}$. Furthermore, $J^{(t)}$ is also the first part of the well-known modularity metric $Q$, which is widely used in research on community detection.
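The cluster signature and the objective can be sketched in a few lines of Python, assuming the signature of a cluster is the sum of the correlations among its nodes divided by the cluster size. The function names and the 4-node correlation matrix are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def cluster_signature(G: Matrix, cluster: List[int]) -> float:
    """Sum of G_ij over all i, j in the cluster, divided by the cluster size."""
    total = sum(G[i][j] for i in cluster for j in cluster)
    return total / len(cluster)


def objective(G: Matrix, partition: List[List[int]]) -> float:
    """Sum of the cluster signatures over all clusters of the partition."""
    return sum(cluster_signature(G, cluster) for cluster in partition)


# Hypothetical spin correlation matrix with two obvious blocks {0, 1} and {2, 3}.
G = [
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
good = [[0, 1], [2, 3]]   # follows the blocks
bad = [[0, 2], [1, 3]]    # cuts across the blocks
print(objective(G, good), objective(G, bad))
```

The partition that follows the block structure scores higher than the one that cuts across it, which is the behavior the maximization relies on.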

Further discussion is facilitated by reformulating the average association objective in matrix form. We denote the membership vector of node $i$ by $x_i$, a probability vector that describes node $i$'s involvement in each community; $x_{ik}$ denotes the $k$-th entry of the membership vector of node $i$. The hard partition and disjointness of the sets $V_k$ require that the vectors $x_i$ and $x_j$ be orthogonal. The objective can be written in terms of the indicator vectors $x_k$ as

$$J^{(t)} = \sum_{k=1}^{K} \frac{x_k^{T} G^{(t)} x_k}{x_k^{T} x_k} \tag{14}$$

The objective is to be maximized under the conditions that the entries of $x_k$ are in $\{0, 1\}$ and $x_i^{T} x_j = 0$ if $i \neq j$. Eq. (14) can be rewritten as a matrix trace by collecting the vectors $x_k$ into a matrix $X = (x_1, x_2, \dots, x_K)$. We can then write the objective as

$$J^{(t)} = \mathrm{tr}\left\{ (X^{T} X)^{-1} X^{T} G^{(t)} X \right\} \tag{15}$$

where the matrix $X^{T} X$ is diagonal. The substitution $Y = X (X^{T} X)^{-1/2}$ simplifies the optimization problem to $\max \mathrm{tr}\{Y^{T} G^{(t)} Y\}$. The condition $Y^{T} Y = I_K$ is automatically satisfied since

$$Y^{T} Y = (X^{T} X)^{-1/2} (X^{T} X) (X^{T} X)^{-1/2} = I_K \tag{16}$$

The vectors $y_k$ thus have unit length and are orthogonal to each other. The optimization problem can be written in terms of the matrix $Y$ as

$$\max_{Y^{T} Y = I} \mathrm{tr}\left\{ Y^{T} G^{(t)} Y \right\} \tag{17}$$


Lemma 2 (Rayleigh-Ritz theorem) Let $L$ be a symmetric $N \times N$ matrix with eigenvalues $1 = \lambda_1 > \lambda_2 > \dots > \lambda_N$ and corresponding eigenvectors $u_1, \dots, u_N$. Then

$$\max \sum_{k=1}^{K} y_k^{T} L\, y_k \quad \text{s.t.} \quad y_j^{T} y_k = \delta_{jk} \tag{18}$$

equals $\sum_{k=1}^{K} \lambda_k$, and the maximizing vectors $y_1, \dots, y_K$ lie in the subspace spanned by $u_1, \dots, u_K$.

The Rayleigh-Ritz theorem [31] tells us that the maximum for this problem is attained when the columns of $Y$ are the eigenvectors corresponding to the $K$ largest eigenvalues of the symmetric correlation matrix $G^{(t)}$. We assume that the eigenvalues of $P$ are sorted such that $\lambda_1 = 1 > |\lambda_2| > |\lambda_3| > \dots > |\lambda_N|$ and the eigenvector corresponding to $\lambda_k$ is denoted by $u_k$. Then the optimal solution of Eq. (17) is the matrix $Y = U = \{u_1, \dots, u_K\}$, and the strength of such a cluster is equal to the $t$-th power of its corresponding eigenvalue:

$$S_k^{(t)} = \frac{u_k^{T} G^{(t)} u_k}{u_k^{T} u_k} = \frac{\lambda_k^{t}\, u_k^{T} u_k}{u_k^{T} u_k} = \lambda_k^{t} \tag{19}$$
For the convergence of the Potts model across multiple timescales, the vanishing of the smaller eigenvalues as time grows describes the loss of different spin states and the removal of the structural features encoded in the corresponding weaker eigenvectors. For the purpose of community identification, the intermediate timescales of local uniform states are of interest. If we want to identify $z$ communities, we expect that for $P^t$ at a given timescale the eigenvalues $\lambda_k^t$ are significantly different from zero only for the range $k = 1, \dots, z$. This is achieved by determining $t$ such that $|\lambda_k|^t \approx 0$ for $k > z$.

From another perspective, because the eigenvalues are sorted as $\lambda_1 = 1 > |\lambda_2| > |\lambda_3| > \dots > |\lambda_N|$, the strength of a community at time $t$, $\lambda_k^t$, can also be viewed as the robustness of the $k$-spin state at time $t$. From this point of view, the eigengap $\lambda_k^t - \lambda_{k+1}^t$ can be interpreted as the "difficulty" with which the $(k+1)$-spin state transfers to the $k$-spin state at time $t$. Given the correlation matrix $G$, one can measure the most suitable number of possible spins at a specific time $t$ by searching for the value $k$ such that the eigengap $\lambda_k^t - \lambda_{k+1}^t$ is maximized. The number of communities $\Lambda$ at time $t$ is then inferred from the location of the maximal eigengap, and this maximal value can be used as a quality measure for the most stable state. $\Lambda(t)$ is formally defined as

$$\Lambda(t) = \arg\max_{k}\left(\lambda_k^{t} - \lambda_{k+1}^{t}\right) \tag{20}$$

From a global perspective, if the number of communities $\Lambda$ does not change for the longest time, we can consider it the optimal number of communities for this network, denoted T.
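The eigengap rule of Eq. (20) can be sketched as follows. The sketch assumes the eigenvalues are real, non-negative, and already sorted in decreasing order; the function name and the spectrum (four eigenvalues near 1 followed by a clear drop) are hypothetical.

```python
from typing import List


def number_of_communities(eigenvalues: List[float], t: int) -> int:
    """Return the k (1-based) that maximizes lambda_k^t - lambda_{k+1}^t."""
    powered = [lam ** t for lam in eigenvalues]
    gaps = [powered[k] - powered[k + 1] for k in range(len(powered) - 1)]
    k_best = max(range(len(gaps)), key=lambda k: gaps[k])
    return k_best + 1


# Hypothetical spectrum, sorted in decreasing order.
eigenvalues = [1.0, 0.95, 0.9, 0.85, 0.3, 0.25, 0.2, 0.1]
for t in (1, 5, 30):
    print(t, number_of_communities(eigenvalues, t))
```

For this spectrum the largest gap sits after the fourth eigenvalue at short timescales, while at long timescales every eigenvalue except the first has decayed, so the rule returns a single community.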

The number of communities $\Lambda$ may stay the same for a long time. However, the variation of the spin configuration hidden behind our model is still not clear. To reveal the details of the changes, we need to determine the timescale at which the community structure represented by the spin configuration is robust. To a certain extent, the most stable state can represent the spin configuration of the whole network. Thus, we define the stability of the community structure at each timescale, $\Theta(t)$, as the stability of the most stable spin state:

$$\Theta(t) = \lambda_{\Lambda(t)}^{t} - \lambda_{\Lambda(t)+1}^{t} \tag{21}$$

Our expectation is that from the trend of $\Theta(t)$, one can find the most stable timescale for the community structure, where $\Theta(t)$ reaches its maximum. At this time, the distribution of the spin configuration represents the most suitable community structure. Furthermore, from a global perspective, we can use the largest stability corresponding to $q$ communities, $\Gamma(q) = \max\{\Theta(t) \mid \Lambda(t) = q\}$, to indicate the robustness of a network, defined as the stability of the structure with $q$ communities. $\Gamma(q)$ directly characterizes the network structure rather than a specific network partition and is thus very convenient for estimating the modularity property of the network.
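A minimal sketch of how these quantities could be tabulated over a range of timescales is given below. It assumes the stability at time $t$ is the maximal eigengap of the $t$-th powers of the eigenvalues, and that the eigenvalues are real, non-negative, and sorted in decreasing order; the function names and the spectrum are hypothetical.

```python
from typing import Dict, List, Tuple


def scan_timescales(eigenvalues: List[float], t_max: int) -> List[Tuple[int, int, float]]:
    """For t = 1..t_max return (t, number of communities, stability)."""
    rows = []
    for t in range(1, t_max + 1):
        powered = [lam ** t for lam in eigenvalues]
        gaps = [powered[k] - powered[k + 1] for k in range(len(powered) - 1)]
        k_best = max(range(len(gaps)), key=lambda k: gaps[k])
        rows.append((t, k_best + 1, gaps[k_best]))
    return rows


def largest_stability(rows: List[Tuple[int, int, float]]) -> Dict[int, Tuple[float, int]]:
    """For each community count q, the largest stability and the t where it occurs."""
    best: Dict[int, Tuple[float, int]] = {}
    for t, q, theta in rows:
        if q not in best or theta > best[q][0]:
            best[q] = (theta, t)
    return best


# Hypothetical spectrum, sorted in decreasing order.
eigenvalues = [1.0, 0.95, 0.9, 0.85, 0.3, 0.25, 0.2, 0.1]
rows = scan_timescales(eigenvalues, 40)
for q, (theta, t) in sorted(largest_stability(rows).items()):
    print(q, round(theta, 4), t)
```

Counting for how many consecutive timescales each community number persists in `rows` gives the persistence used to pick the optimal number of communities.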

To show that the model can uncover hierarchical structures at different scales, Fig. 2 and Fig. 3 give two examples of multi-level community structures. Fig. 2(a) shows the RB125 network, which is a hierarchical scale-free network proposed by Ravasz and Barabasi. The regions corresponding to 5 and 25 modules are the most representative in terms of resolution. Next, H13-4, proposed by Arenas et al., is shown in Fig. 3(a); it is a homogeneous-degree network with two predefined hierarchical scales. The first hierarchical level consists of 4 modules of 64 nodes and the second level consists of 16 modules of 16 nodes. The partitions of both levels are highlighted on the original networks.
In both examples, the most persistent $\Lambda$ reveals the actual number of hierarchical levels hidden in a network. The signature of such levels can be quantified by the corresponding length of their persistence time. The longer the persistence, the more robust the configuration is. From Fig. 2(b) and Fig. 3(b), we can observe that 25 and 16 are the optimal numbers of communities in the RB125 and H13-4 networks, respectively, having the longest persistence. However, 5 modules and 4 modules are also reasonable partitions, which shows the fuzzy levels of the hierarchical networks. These results are perfectly consistent with the generation mechanisms and hierarchical patterns of these two networks.

Furthermore, we also show that the variation of the stability $\Theta(t)$ in the two cases sheds light on the spin configuration. From Fig. 2(b) and Fig. 3(b), the stability $\Theta(t)$ does not have a parabolic shape over the timescales of a specific $\Lambda$. Thus we cannot easily find the global optimum. However, there are some local maxima representing better community structure. Thus, we can find the local maximal timescales corresponding to the desired number of communities and apply a specific partition method there. Furthermore, the stability reaches its lowest value at the end time of each $\Lambda$ and begins to increase when the system transits to a new state. One can use $\Theta(t)$ to estimate the modularity property of complex networks: the larger $\Theta$ is, the stronger the network community structure. So, one can find the largest $\Theta$ value for a specific number of communities $\Lambda$ and use it to indicate the robustness of the modular structure. For H13-4, shown in Fig. 3(b), the stability of the 16-community structure, $\Gamma(16) = 0.48$ at $t = 4$, is larger than $\Gamma(4) = 0.31$ at $t = 7$. This indicates that the community structure containing 16 modules is more robust than the one containing 4 modules. Similarly, for the RB125 network shown in Fig. 2(b), $\Gamma(25) = 0.48$, corresponding to the 25-community structure at $t = 5$, is larger than $\Gamma(5) = 0.31$ at $t = 7$. The robustness of community structure indicated by the stability $\Gamma(q)$ favors small but obvious modules. This is the same as in [6, 7] and is reasonable for many real networks.

Finally, we emphasize the difference between the stability measure proposed in this paper and the modularity $Q$ proposed by Newman [1, 3]. $Q$ is a well-known criterion for evaluating a specific partition scheme of a network. It is defined as "the fraction of edges that fall within communities, minus the expected value of the same quantity if edges fall at random without regard for the community structure" [18, 28]. Different partition schemes yield different $Q$ values for the same network, and larger values mean better partitions.
FIG. 2: (a) Structure of RB125, with 25 dense communities and 5 sparse communities highlighted in the original network. (b) The values of $\Lambda(t)$ and $\Theta(t)$ versus time $t$.

FIG. 3: (a) Structure of H13-4, with 16 dense communities and 4 sparse communities highlighted in the original network. (b) The values of $\Lambda(t)$ and $\Theta(t)$ versus time $t$.

In contrast, our $\Lambda$ and $\Gamma$ directly characterize and evaluate the structural property based on the network's spectra, rather than a specific network partition. Therefore, a network has exactly one self-determined pair of $\Lambda$ and $\Gamma$ values regardless of how many partition schemes it has, and the larger $\Gamma$ is, the stronger the network community structure. In addition, Fortunato et al. pointed out the resolution limit problem of the modularity $Q$: there exists an intrinsic scale beyond which small qualified communities cannot be detected by maximizing the modularity. However, as shown in Fig. 4, when a clique ring contains cliques of different scales (i.e., heterogeneous community sizes), the intrinsic community structure can be exactly revealed by $\Lambda$. With $\Lambda$ and $\Gamma$, we can quantitatively compare the modular structure of different types of complex networks.

FIG. 4: (a) Ring-of-cliques network as a schematic example. Each circle corresponds to a clique, whose size is marked by its label C20 (contains 20 nodes) or C10 (contains 10 nodes). (b) The values of $\Lambda(t)$ and $\Theta(t)$ versus time $t$.


V. A NEW ALGORITHM TO DETECT COMMUNITY 


To actually perform the community detection, we propose an approach based on the eigenvalue decomposition [29] of the correlation matrix $G^{(t)}$. This algorithm allows us to identify multivariate communities across multiple timescales. Based on the above analysis, we relate the multivariate community structure to the dynamics of the eigenvalues and eigenvectors.

The eigenvalues $\lambda_k^t$ and eigenvectors $u_k$ of the symmetric and real-valued matrix $G^{(t)}$ can be obtained by solving the eigenvalue equation

$$G^{(t)} u_k = \lambda_k^{t} u_k, \qquad k = 1, \dots, N \tag{22}$$

which has $N$ different solutions when the time $t$ is small. Assume that the eigenvectors are normalized ($\sum_i u_k^2(i) = 1$). Each signature $S_k(t) = \lambda_k^t$ is associated with a specific community (the elements in the vector have the same spin value) and quantifies its strength at a given timescale. For each community $k$, the internal structure is described by the corresponding eigenvector $u_k$. After normalization ($\sum_i u_k^2(i) = 1$), its components quantify the relative involvement of each node $i$ in community $k$ by $u_k^2(i)$. Combining the signature of the community and the index $u_k^2(i)$, the "absolute" involvement of node $i$ in community $k$ at time $t$ can be described by the following participation index,

$$R_{ik}^{(t)} = \lambda_k^{t}\, u_k^{2}(i) \tag{23}$$

Node $i$ is considered as belonging to community $k$ when the participation index becomes maximal.
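The participation index can be sketched as follows, under the assumption (suggested by the surrounding text) that it is the product of the community signature $\lambda_k^t$ and the squared eigenvector component $u_k^2(i)$. The function names, the two eigenvalues, and the two orthonormal eigenvectors below are hypothetical inputs chosen for illustration, not results from the paper.

```python
import math
from typing import List


def participation_index(
    eigenvalues: List[float], eigenvectors: List[List[float]], t: int
) -> List[List[float]]:
    """R[i][k] = lambda_k^t * u_k(i)^2, with eigenvectors[k][i] = u_k(i)."""
    n = len(eigenvectors[0])
    return [
        [(lam ** t) * (u[i] ** 2) for lam, u in zip(eigenvalues, eigenvectors)]
        for i in range(n)
    ]


def hard_assignment(R: List[List[float]]) -> List[int]:
    """Assign each node to the community with the largest participation index."""
    return [max(range(len(row)), key=lambda k: row[k]) for row in R]


# Hypothetical orthonormal eigenvectors for 4 nodes: u_0 weights nodes 0 and 1
# more strongly, u_1 weights nodes 2 and 3 more strongly.
a, b = 0.6, math.sqrt(0.14)
eigenvalues = [1.0, 0.9]
eigenvectors = [[a, a, b, b], [b, b, -a, -a]]

for t in (0, 1, 10):
    print(t, hard_assignment(participation_index(eigenvalues, eigenvectors, t)))
```

Each row of the returned matrix can also be kept as it is, rather than reduced to its maximum, to describe a node's graded involvement in several communities at once.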

From Eq. (23), we observe that the participation index evolves as time goes on. When the 
timescale t → ∞, all eigenvalues approach 0 except the largest one, which equals 1. At 
this time, all nodes belong to the same community according to the definition of the 
participation index. At the other extreme, when t = 0, the participation matrix R reduces to 
the eigenvector matrix. All of its columns are different, and the number of communities 
is equal to the dimension of the matrix. Here we are interested in the optimal partition at an 
intermediate timescale with large stability 0(t), when the spin configuration represents the 
most robust community structure. So, we first determine the optimal number of communities 
by using A over a long range of time t. Then, we pick the timescale t at which the stability 0(t) is 
maximal and A(t) equals the optimal number of communities. 

For many real networks, communities are formulated as a hard partition, in which each node 
belongs to only one cluster after the clustering. This is often too restrictive, because nodes 
at the boundary between communities share commonalities with more than one community 
and play a transitional role in many diffusive networks. In our work, the participation 
index R motivates the extension of the partition to a probabilistic setting. It is extended 
to the fuzzy partition concept, in which each node may belong to different communities with 
nonzero probabilities at the same time, which is more reasonable for real-world networks. Finally, we 
calculate the participation index at the most stable time t. 

The framework of the whole process is summarized in Algorithm 1. In the algorithm, the 
calculation of the spin-spin correlation matrix C is based on the SW algorithm and costs less 
than O(N^). The computational cost of the algorithm lies mainly in the calculation of the 
eigensystem of G, which for sparse graphs is about O(N^). The other steps of the process are 
simple matrix computations. So, finally, the cost of Algorithm 1 is O(N^). Our algorithm 
is a parameter-free method and very easy to apply to real networks. 
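
To illustrate how a participation index could be read as a fuzzy partition, the sketch below normalizes each row of a nonnegative matrix R over the K retained communities so that the entries of a row sum to one, and then flags nodes with a sizeable second-largest membership as overlapping. The row normalization, the threshold value, and all function names are assumptions made for illustration; the text above does not prescribe them.

```python
import numpy as np

def fuzzy_membership(R, n_communities):
    """Row-normalize the first n_communities columns of R into membership weights."""
    R_top = np.asarray(R, dtype=float)[:, :n_communities]
    row_sums = R_top.sum(axis=1, keepdims=True)
    # Guard against all-zero rows so that no division by zero occurs.
    return R_top / np.where(row_sums > 0, row_sums, 1.0)

def overlapping_nodes(M, threshold=0.3):
    """Indices of nodes whose second-largest membership reaches the threshold.

    Requires at least two communities (columns) in M.
    """
    second_largest = np.sort(M, axis=1)[:, -2]
    return np.flatnonzero(second_largest >= threshold)

# Hypothetical participation values for three nodes and three candidate communities.
R_example = np.array([[0.9, 0.1, 0.5],
                      [0.5, 0.5, 0.1],
                      [0.0, 1.0, 0.3]])
M_example = fuzzy_membership(R_example, n_communities=2)
```

In this toy input only the second node splits its membership evenly between the two communities, so it is the only one reported by `overlapping_nodes(M_example)`.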
Algorithm 1 Framework of our new algorithm. 

Input: 

The adjacency matrix of the network A; 

Output: 

The participation index R; 

1: Calculate the spin-spin correlation matrix C. 

2: Calculate the Markov transition probability matrix P and G based on C. 

3: Calculate the eigenvalues and corresponding eigenvectors of G. 

4: Find the optimal number of communities K and corresponding times t with the largest stability. 

5: Calculate the participation index R according to Eq. (23). 

6: Return: Output the participation index R. 


VI. EXPERIMENTS 

In this section, we will benchmark the performance of our algorithm. We designed and 
implemented three experiments for two main purposes: (1) to evaluate the accuracy of the 
algorithm; (2) to apply it to real large-scale networks. 


A. Benchmark network 


We empirically demonstrate the effectiveness of our algorithm through comparison with 
five other well-known algorithms on artificial benchmark networks. These algorithms 
include Newman’s fast algorithm [1], Danon et al.’s method [32], the Louvain method, 
Infomap [l^], and the clique percolation method [27]. We utilize the widely used Ad-Hoc network 
model, which can produce a random synthetic network containing 4 predefined communities, 
each with 32 nodes. The average degree of the nodes is 16, and the ratio of intra-community 
links is denoted as P_in. As P_in decreases, the community structures of Ad-Hoc 
networks become more and more ambiguous, and correspondingly, their Γ(4) values climb 
from 0 to 1, as shown in Fig. 5(a). 
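
A benchmark of this kind can be sketched as follows. The code assumes the common planted-partition construction, in which every pair of nodes is linked independently with a probability chosen so that the expected degree is 16 and the expected fraction of intra-community links equals the given ratio. The function name, its parameters, and the independent-edge construction are assumptions for illustration and may differ from the generator actually used.

```python
import numpy as np

def ad_hoc_network(p_in_ratio, n_groups=4, group_size=32, avg_degree=16, seed=None):
    """Return (adjacency matrix, community labels) of a random Ad-Hoc style network."""
    rng = np.random.default_rng(seed)
    n = n_groups * group_size
    labels = np.repeat(np.arange(n_groups), group_size)

    z_in = avg_degree * p_in_ratio           # expected intra-community degree
    z_out = avg_degree - z_in                # expected inter-community degree
    prob_in = z_in / (group_size - 1)        # link probability inside a community
    prob_out = z_out / (n - group_size)      # link probability between communities

    same = labels[:, None] == labels[None, :]
    prob = np.where(same, prob_in, prob_out)
    # Draw each unordered pair once (strict upper triangle), then symmetrize.
    upper = np.triu(rng.random((n, n)) < prob, k=1)
    adjacency = (upper | upper.T).astype(int)
    return adjacency, labels

A_bench, labels_bench = ad_hoc_network(1.0, seed=0)
```

With `p_in_ratio = 1.0` the four communities are disconnected from each other; lowering the ratio moves links from inside the communities to between them while keeping the expected degree fixed.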


We use the normalized mutual information (NMI) measure [3^] to evaluate the partition 
found by each algorithm. We ask whether the intrinsic scale can be correctly 
uncovered. The experimental results are illustrated in Fig. 5(b), where the y-axis represents the NMI 
value, and each point on the curves is obtained by averaging the values obtained on 50 synthetic 
networks. As we can see, all algorithms work well when 1 - μ is more than 0.7, with NMI 
larger than 0.85. Compared with the other five algorithms, our algorithm performs best overall. Its 
accuracy is only slightly worse than that of clique percolation when 0.5 < 1 - μ < 0.65. 
However, the complexity of clique percolation is more than O(n^) and nearly the same 
as that of the time-consuming Breadth First Search (BFS). By contrast, the time complexity of 
our method is very low (O(n^)), and the method can be easily implemented. 

FIG. 5: (a) Γ(4) values of networks versus different P_in. (b) Comparison of accuracy of our 
algorithm with other five existing algorithms. 
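
For reference, the NMI between a detected partition and the planted one can be computed as in the sketch below. It uses the widely used definition NMI = 2 I(A; B) / (H(A) + H(B)), which is assumed here to be the variant intended in the comparison; the function name is hypothetical.

```python
import numpy as np

def normalized_mutual_information(labels_a, labels_b):
    """NMI = 2 * I(A; B) / (H(A) + H(B)) for two hard partitions of the same nodes."""
    a = np.asarray(labels_a).ravel()
    b = np.asarray(labels_b).ravel()
    _, a_idx = np.unique(a, return_inverse=True)
    _, b_idx = np.unique(b, return_inverse=True)

    # Joint distribution of community labels over the nodes.
    confusion = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(confusion, (a_idx, b_idx), 1)
    p_ab = confusion / a.size
    p_a = p_ab.sum(axis=1)
    p_b = p_ab.sum(axis=0)

    nonzero = p_ab > 0
    mutual_info = np.sum(p_ab[nonzero] * np.log(p_ab[nonzero] / np.outer(p_a, p_b)[nonzero]))
    entropy_a = -np.sum(p_a * np.log(p_a))
    entropy_b = -np.sum(p_b * np.log(p_b))
    if entropy_a + entropy_b == 0:
        return 1.0  # both partitions consist of a single community
    return 2.0 * mutual_info / (entropy_a + entropy_b)
```

The score is 1 when the two partitions coincide up to a relabeling of the communities and 0 when they are statistically independent.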


B. US Football network 


The United States college football team network has been widely used as a benchmark 
example [1] [28] due to its natural community structure. We used the data gathered by Girvan 
and Newman [1]. It is a representation of the schedule of Division I American football games 
in the 2000 season in the USA. The nodes in the network represent the 115 teams, while the 
edges represent 613 games played in the course of the year. The whole network can be 
naturally divided into 12 distinct groups. As a result, games are generally more frequent 
between members of the same group than between members of different groups. 

First, we calculate A and the corresponding stability 0, and the results are illustrated in 
Fig. 6. The results show that the optimal number of communities is A = 12, which perfectly agrees 
with the true situation. The stability 0 reaches Γ(12) = 0.31 when t = 4. Then we apply 
our algorithm to the football team network and partition the network into 12 communities, 
as shown in Fig. 7. The correct rate of our method is more than 93%, which means 
that the detected community structure is in high agreement with the true community 
structure. In fact, methods based on optimization of modularity Q usually find only 
11 communities, and their correct rate is low due to the fuzziness of the network. We conclude 
that the ability of our method to reveal a natural characteristic is valuable for many real 
networks. Furthermore, our algorithm has identified 5 interesting overlapping nodes, which 
are drawn as yellow triangles. These nodes all lie fuzzily at the boundaries between communities and 
can be viewed as relatively independent clubs, which can be interpreted readily by the 
human eye. 

FIG. 6: Computational results of A(t) and 0(t) on the US football network. 

FIG. 7: Computational results of our algorithm on the football team network. The nodes with 
the same shapes and colors are teams in the same group, and the dense subgraphs in the 
layout are communities detected by the algorithm. Four fuzzy overlapping nodes are described as 
independents. 


C. Scientific collaboration network 


Finally, we tested our algorithm on a large-scale network, the scientific collaboration 
network collected by Girvan and Newman [22]. The network illustrates the research 
collaborations among 56,276 physicists in terms of their coauthored papers posted on the Physics 
E-print Archive at arxiv.org. In total, this network contains 315,810 weighted edges. For 
visualization purposes, our algorithm outputs a transformed adjacency matrix (in which the 
nodes within the same community are grouped together) with a hierarchical community 
structure. From the transformed matrix in Fig. 8(a), one can observe a quite strong 
community structure, or a group-oriented collaboration pattern. Among these physicists, the three 
biggest research communities are self-organized around three main research fields: 
condensed matter, high-energy physics (including theory, phenomenology and nuclear), and 
astrophysics. 
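
The transformed adjacency matrix mentioned above amounts to applying the same permutation to rows and columns so that nodes of one community become contiguous. A minimal sketch, assuming a hard community label per node is available (the function name is hypothetical, and any ordering of the communities themselves is left out):

```python
import numpy as np

def group_by_community(adjacency, labels):
    """Permute rows and columns so that nodes with equal labels are adjacent."""
    order = np.argsort(labels, kind="stable")   # stable sort keeps the node order within a community
    adjacency = np.asarray(adjacency)
    return adjacency[np.ix_(order, order)], order

# Toy example: nodes 0 and 2 share a community and are linked.
A_toy = np.array([[0, 0, 1],
                  [0, 0, 0],
                  [1, 0, 0]])
A_grouped, node_order = group_by_community(A_toy, [0, 1, 0])
```

After the permutation, dense diagonal blocks in the matrix correspond to communities, which is what makes the block pattern visible in a matrix plot.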


The cumulative distribution of community sizes in power plot is shown in Fig. 8(b), and it is 
a typical scale-free distribution, which exists broadly in the real world. In total, 737 communities 
were detected at the optimal community stability; the maximum size of those communities 
is 195, the minimum size is 2, and the average size is 76. Among these communities, 
1,433 of 6,931 pairs of communities have a fuzzy participation index with each other. The 5% 
largest communities contain 25.4% of the nodes, while the others are relatively small. The 
three largest communities correspond closely to research subareas. The largest is solid-state 
physics, the second largest is super-nuclear physics, and the third is theoretical astrophysics. 
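
The size statistics reported above can be derived from a list of community sizes as in the following sketch. The function names are hypothetical, and the example sizes are made up for illustration; they are not the data of the collaboration network.

```python
import numpy as np

def cumulative_size_distribution(sizes):
    """Return (distinct sizes s, fraction of communities with size >= s)."""
    sizes = np.asarray(sizes)
    values = np.unique(sizes)
    fractions = np.array([(sizes >= s).mean() for s in values])
    return values, fractions

def share_in_largest(sizes, fraction):
    """Fraction of all nodes contained in the largest `fraction` of communities."""
    sizes = np.sort(np.asarray(sizes))[::-1]
    n_top = max(1, int(np.ceil(fraction * sizes.size)))
    return sizes[:n_top].sum() / sizes.sum()

example_sizes = [2, 2, 3, 5]   # illustrative values only
size_values, size_fractions = cumulative_size_distribution(example_sizes)
top_share = share_in_largest(example_sizes, 0.25)
```

Plotting `size_fractions` against `size_values` on logarithmic axes is the usual way to check whether the distribution is approximately scale-free.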


Furthermore, a subnetwork including eight communities is shown in Fig. 8(c), and four 
regions including 10 overlapping nodes are highlighted by four circles, which were detected 
according to the participation index R. The partition result is completely the same as the 
results in Refs. [22] and [28]. The efficient performance on a large real network indicates that 
our method is useful for further research in various fields. 
FIG. 8: (a) Transformed adjacency matrix of the scientific collaboration network. (b) Distribution 
of community sizes in a linear plot. (c) Subnetwork including eight communities illustrated in 
different shapes and colors and 10 overlapping nodes enclosed by four circles. 


VII. CONCLUSION 


In summary, we have presented a more theoretically based community detection framework, 
which is able to uncover the connection between a network’s community structure and the 
spectral properties of the Potts model’s local uniform state. We demonstrate that important 
information related to community structure, such as the stability of modularity structures 
and the optimal number of communities, can be mined from a network’s spectral signatures 
through a Markov process computation. Based on the theoretical analysis, we further 
developed an algorithm to detect fuzzy community structure. Its effectiveness and efficiency have 
been demonstrated and verified on both simulated networks and real large-scale 
networks. 